% LaTeX source for textbook ``How to think like a computer scientist''
% Copyright (C) 1999  Allen B. Downey

% This LaTeX source is free software; you can redistribute it and/or
% modify it under the terms of the GNU General Public License as
% published by the Free Software Foundation (version 2).

% This LaTeX source is distributed in the hope that it will be useful,
% but WITHOUT ANY WARRANTY; without even the implied warranty of
% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
% General Public License for more details.

% Compiling this LaTeX source has the effect of generating
% a device-independent representation of a textbook, which
% can be converted to other formats and printed.  All intermediate
% representations (including DVI and Postscript), and all printed
% copies of the textbook are also covered by the GNU General
% Public License.

% This distribution includes a file named COPYING that contains the text
% of the GNU General Public License.  If it is missing, you can obtain
% it from www.gnu.org or by writing to the Free Software Foundation,
% Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.


\chapter{Linked lists}
\label{list}
\index{list}

\section{Embedded references}
\index{reference}
\index{embedded reference}
\index{reference!embedded}
\index{linked list}
\index{list!linked}
\index{node}
\index{cargo}

We have seen examples of attributes that refer to other objects, which
we called {\bf embedded references} (see Section~\ref{embedded}).  A
common data structure, the {\bf linked list}, takes advantage of this
feature.

Linked lists are made up of {\bf nodes}, where each node contains a
reference to the next node in the list.  In addition, each node
contains a unit of data called the {\bf cargo}.

A linked list is considered a {\bf recursive data
structure} because it has a recursive definition.

\begin{quote}
A linked list is either:
\begin{itemize}

\item the empty list, represented by {\tt None}, or

\item a node that contains a cargo object and a reference
to a linked list.

\end{itemize}

\end{quote}

\index{recursive data structure}
\index{data structure!recursive}

Recursive data structures lend themselves to
recursive methods.


\section{The {\tt Node} class}
\index{Node class}
\index{class!Node}

As usual when writing a new class, we'll start with the
initialization and {\tt \_\_str\_\_} methods so that we
can test the basic mechanism of creating and displaying the new
type:

\beforeverb
\begin{verbatim}
class Node:
  def __init__(self, cargo=None, next=None):
    self.cargo = cargo
    self.next  = next

  def __str__(self):
    return str(self.cargo)
\end{verbatim}
\afterverb
%
As usual, the parameters for the initialization method are optional. By
default, both the cargo and the link, {\tt next}, are set
to {\tt None}.

The string representation of a node is just the string representation
of the cargo.  Since any value can be passed to the {\tt str}
function, we can store any value in a list.

To test the implementation so far, we can create a {\tt Node}
and print it:

\beforeverb
\begin{verbatim}
>>> node = Node("test")
>>> print node
test
\end{verbatim}
\afterverb
%
To make it interesting, we need a list with more than
one node:

\beforeverb
\begin{verbatim}
>>> node1 = Node(1)
>>> node2 = Node(2)
>>> node3 = Node(3)
\end{verbatim}
\afterverb
%
This code creates three nodes, but we don't have a list yet
because the nodes are not {\bf linked}.  The state diagram
looks like this:

\beforefig
\centerline{\psfig{figure=illustrations/link1.eps}}
\afterfig

To link the nodes, we have to make the first node refer to the
second and the second node refer to the third:

\beforeverb
\begin{verbatim}
>>> node1.next = node2
>>> node2.next = node3
\end{verbatim}
\afterverb
%
The reference of the third node is {\tt None}, which indicates that
it is the end of the list.  Now the state diagram looks like this:

\beforefig
\centerline{\psfig{figure=illustrations/link2.eps}}
\afterfig

Now you know how to create nodes and link them into lists.  What
might be less clear at this point is why.


\section{Lists as collections}
\index{collection}

Lists are useful because they provide a way to assemble multiple
objects into a single entity, sometimes called a {\bf collection}.  In
the example, the first node of the list serves as a reference to the
entire list.

\index{list!printing}
\index{list!as argument}

To pass the list as an argument, we only have to pass a
reference to the first node.  For example, the function {\tt printList}
takes a single node as an argument.  Starting with the head of the
list, it prints each node until it gets to the end:

\beforeverb
\begin{verbatim}
def printList(node):
  while node:
    print node,
    node = node.next
  print
\end{verbatim}
\afterverb
%
To invoke this function, we pass a reference to the
first node:

\beforeverb
\begin{verbatim}
>>> printList(node1)
1 2 3
\end{verbatim}
\afterverb
%
Inside {\tt printList} we have a reference to the first node
of the list, but there is no variable that refers to the other
nodes.  We have to use the {\tt next} value from each node
to get to the next node.

To traverse a linked list, it is common to use a loop variable like
{\tt node} to refer to each of the nodes in succession.

\index{loop variable}
\index{list!traversal}
\index{traverse}

This diagram shows the nodes in the list and the values that
{\tt node} takes on:

\beforefig
\centerline{\psfig{figure=illustrations/link3.eps}}
\afterfig

\begin{quote}
{\em By convention, lists are often printed in brackets with commas
between the elements, as in {\tt [1, 2, 3]}.  As an exercise, modify
{\tt printList} so that it generates output in this format.}
\end{quote}


\section{Lists and recursion}
\label{listrecursion}
\index{list!traverse recursively}
\index{traverse}

It is natural to express many list operations using recursive methods.
For example, the following is a recursive algorithm for printing a list
backwards:

\begin{enumerate}

\item Separate the list into two pieces: the first node (called
the head); and the rest (called the tail).

\item Print the tail backward.

\item Print the head.

\end{enumerate}

Of course, Step 2, the recursive call, assumes that we have a way of
printing a list backward.  But if we assume that the recursive
call works---the leap of faith---then we can convince ourselves that
this algorithm works.

\index{leap of faith}
\index{list!printing backwards}

All we need are a base case and a way of proving that for
any list, we will eventually get to the base case.  Given the
recursive definition of a list, a natural base case is
the empty list, represented by {\tt None}:

\beforeverb
\begin{verbatim}
def printBackward(list):
  if list == None: return
  head = list
  tail = list.next
  printBackward(tail)
  print head,
\end{verbatim}
\afterverb
%
The first line handles the base case by doing nothing.  The
next two lines split the list into {\tt head} and {\tt tail}.
The last two lines print the list.  The comma at the end of the
last line keeps Python from printing a newline after each node.

We invoke this function as we invoked {\tt printList}:

\beforeverb
\begin{verbatim}
>>> printBackward(node1)
3 2 1
\end{verbatim}
\afterverb
%
The result is a backward list.

You might wonder why {\tt printList} and {\tt printBackward} are
functions and not methods in the {\tt Node} class.  The reason is that
we want to use {\tt None} to represent the empty list and it is not
legal to invoke a method on {\tt None}.  This limitation makes it
awkward to write list-manipulating code in a clean object-oriented
style.

Can we prove that {\tt printBackward} will always terminate?   In other
words, will it always reach the base case?  In fact, the answer
is no.  Some lists will make this function crash.


\section{Infinite lists}
\index{infinite list}
\index{list!infinite}
\index{loop!in list}
\index{list!loop}

There is nothing to prevent a node from referring back to
an earlier node in the list, including itself.  For example,
this figure shows a list with two nodes, one of which refers
to itself:

\beforefig
\centerline{\psfig{figure=illustrations/link4.eps}}
\afterfig

If we invoke {\tt printList} on this list, it will loop forever.
If we invoke {\tt printBackward}, it will recurse infinitely.
This sort of behavior makes infinite lists difficult to work
with.

Nevertheless, they are occasionally useful.  For example, we
might represent a number as a list of digits and use an infinite
list to represent a repeating fraction.

Regardless, it is problematic that we cannot prove that {\tt printList}
and {\tt printBackward} terminate.  The best we can do is the
hypothetical statement, ``If the list contains no loops, then these
functions will terminate.''  This sort of claim is called a {\bf
precondition}.  It imposes a constraint on one of the arguments and
describes the behavior of the function if the constraint is satisfied.
You will see more examples soon.

\index{precondition}


\section{The fundamental ambiguity theorem}
\index{ambiguity!fundamental theorem}
\index{theorem!fundamental ambiguity}

One part of {\tt printBackward} might have raised
an eyebrow:

\beforeverb
\begin{verbatim}
    head = list
    tail = list.next
\end{verbatim}
\afterverb
%
After the first assignment, {\tt head} and {\tt list} have the same
type and the same value.  So why did we create a new variable?

The reason is that the two variables play different roles.  We think
of {\tt head} as a reference to a single node, and we think of
{\tt list} as a reference to the first node of a list.  These
``roles'' are not part of the program; they are in the mind of the
programmer.

\index{variable!roles}
\index{role!variable}

In general we can't tell by looking at a program what role a
variable plays.
This ambiguity can be useful, but it can also make programs
difficult to read.  We often use variable names like {\tt node}
and {\tt list} to document how we intend to use a variable and
sometimes create additional variables to disambiguate.

We could have written {\tt printBackward} without {\tt head}
and {\tt tail}, which makes it more concise but possibly
less clear:

\beforeverb
\begin{verbatim}
def printBackward(list) :
  if list == None : return
  printBackward(list.next)
  print list,
\end{verbatim}
\afterverb
%
Looking at the two function calls, we have to remember that {\tt
printBackward} treats its argument as a collection and {\tt print}
treats its argument as a single object.

The {\bf fundamental ambiguity theorem} describes the ambiguity
that is inherent in a reference to a node:

\begin{quote}
{\bf A variable that refers to a node might treat the node as a single
object or as the first in a list of nodes.}
\end{quote}



\section{Modifying lists}
\index{list!modifying}
\index{modifying lists}

There are two ways to modify a linked list.  Obviously, we can change
the cargo of one of the nodes, but the more interesting operations are
the ones that add, remove, or reorder the nodes.

As an example, let's write a function that removes the second
node in the list and returns a reference to the removed node:

\beforeverb
\begin{verbatim}
def removeSecond(list):
  if list == None: return
  first = list
  second = list.next
  # make the first node refer to the third
  first.next = second.next
  # separate the second node from the rest of the list
  second.next = None
  return second
\end{verbatim}
\afterverb
%
Again, we are using temporary variables to make the code more
readable.  Here is how to use this function:

\beforeverb
\begin{verbatim}
>>> printList(node1)
1 2 3
>>> removed = removeSecond(node1)
>>> printList(removed)
2
>>> printList(node1)
1 3
\end{verbatim}
\afterverb
%
This state diagram shows the effect of the operation:

\beforefig
\centerline{\psfig{figure=illustrations/link5.eps}}
\afterfig

What happens if you invoke this function and pass a list with only one
element (a {\bf singleton})?  What happens if you pass the empty list
as an argument?  Is there a precondition for this function?  If so, fix
the function to handle a violation of the precondition in a reasonable
way.

\index{singleton}


\section{Wrappers and helpers}
\index{wrapper function}
\index{function!wrapper}
\index{helper function}
\index{function!helper}

It is often useful to divide a list operation into
two functions.  For example, to print a list
backward in the
format {\tt [3 2 1]} we can use the
{\tt printBackward} function to print {\tt 3 2 1} but we need
a separate function to print the brackets.
Let's call it {\tt printBackwardNicely}:

\beforeverb
\begin{verbatim}
def printBackwardNicely(list) :
  print "[",
  printBackward(list)
  print "]",
\end{verbatim}
\afterverb
%
Again, it is a good idea to check functions like this to see
if they work with special cases like an empty list or
a singleton.

\index{singleton}

When we use this function elsewhere in the program, we
invoke {\tt printBackwardNicely} directly, and it invokes
{\tt printBackward} on our behalf.  In that sense,
{\tt printBackwardNicely} acts as a {\bf wrapper}, and it uses
{\tt printBackward} as a {\bf helper}.


\section {The {\tt LinkedList} class}
\index{LinkedList}
\index{class!LinkedList}

There are some subtle problems with the way we have been
implementing lists.  In a reversal of cause and effect, we'll
propose an alternative implementation first and then explain what
problems it solves.

First, we'll create a new class called {\tt LinkedList}.  Its
attributes are an integer that contains the length of the list
and a reference to the first node.  {\tt LinkedList} objects
serve as handles for manipulating lists of {\tt Node} objects:

\beforeverb
\begin{verbatim}
class LinkedList :
  def __init__(self) :
    self.length = 0
    self.head   = None
\end{verbatim}
\afterverb
%
One nice thing about the {\tt LinkedList} class is that it provides
a natural place to put wrapper functions like
{\tt printBackwardNicely}, which we can make a
method of the {\tt LinkedList} class:

\beforeverb
\begin{verbatim}
class LinkedList:
  ...
  def printBackward(self):
    print "[",
    if self.head != None:
      self.head.printBackward()
    print "]",

class Node:
  ...
  def printBackward(self):
    if self.next != None:
      tail = self.next
      tail.printBackward()
    print self.cargo,
\end{verbatim}
\afterverb
%
Just to make things confusing, we renamed {\tt printBackwardNicely}.
Now there are two methods named {\tt printBackward}: one in the {\tt
Node} class (the helper); and one in the {\tt LinkedList} class (the
wrapper).  When the wrapper invokes {\tt self.head.printBackward},
it is invoking the helper, because {\tt self.head} is a
{\tt Node} object.

Another benefit of the {\tt LinkedList} class is that it
makes it easier to add or remove the first element of a list.  For
example, {\tt addFirst} is a method for {\tt LinkedList}s; it
takes an item of cargo as an argument and puts it at the beginning of the
list:

\beforeverb
\begin{verbatim}
class LinkedList:
  ...
  def addFirst(self, cargo):
    node = Node(cargo)
    node.next = self.head
    self.head = node
    self.length = self.length + 1
\end{verbatim}
\afterverb
%
As usual, you should check code like this to see if it handles
the special cases.  For example, what happens if the list is initially
empty?


\section {Invariants}
\index{invariant}
\index{object invariant}
\index{list!well-formed}

Some lists are ``well formed"; others are not.  For example, if a list
contains a loop, it will cause many of our methods to crash, so we
might want to require that lists contain no loops.  Another
requirement is that the {\tt length} value in the {\tt LinkedList}
object should be equal to the actual number of nodes in the list.

Requirements like these are called {\bf invariants} because, ideally,
they should be true of every object all the time.  Specifying
invariants for objects is a useful programming practice because it
makes it easier to prove the correctness of code, check the integrity
of data structures, and detect errors.

One thing that is sometimes confusing about invariants is that
there are times when they are violated.  For example, in the
middle of {\tt addFirst}, after we have added the node but
before we have incremented {\tt length}, the invariant is
violated.  This kind of violation is acceptable; in fact, it is
often impossible to modify an object without violating an
invariant for at least a little while.  Normally, we require
that every method that violates an invariant must restore
the invariant.

If there is any significant stretch of code in which the invariant
is violated, it is important for the comments to make that clear,
so that no operations are performed that depend on the invariant.

\index{documentation}


\section{Glossary}
\index{embedded reference}
\index{reference!embedded}
\index{recursive data structure}
\index{data structure!recursive}
\index{linked list}
\index{list!linked}
\index{node}
\index{cargo}
\index{link}
\index{precondition}
\index{invariant}
\index{wrapper}
\index{helper method}
\index{fundamental ambiguity theorem}
\index{singleton}

\begin{description}

\item[embedded reference:] A reference stored in an attribute of
an object.

\item[linked list:] A data structure that implements a collection using
a sequence of linked nodes.

\item[node:] An element of a list, usually implemented as an object
that contains a reference to another object of the same type.

\item[cargo:] An item of data contained in a node.

\item[link:] An embedded reference used to link one object to
another.

\item[precondition:] An assertion that must be true in order for a
method to work correctly.

\item[fundamental ambiguity theorem:] A reference to a list
node can be treated as a single
object or as the first in a list of nodes.

\item[singleton:] A linked list with a single node.

\item[wrapper:] A method that acts as a middleman between a
caller and a helper method, often making the method easier or
less error-prone to invoke.

\item[helper:] A method that is not invoked directly by a caller
but is used by another method to perform part of an operation.

\item[invariant:] An assertion that should be true of an object at
all times (except perhaps while the object is being modified).

\end{description}

